Why Some Models Resist Unlearning: A Linear Stability Perspective
Machine unlearning, the ability to erase the effect of specific training samples without retraining from scratch, is critical for privacy, regulation, and efficiency. However, most progress in unlearning has been empirical, with little theoretical understanding of when and why unlearning works. We tackle this gap by framing unlearning through the lens of asymptotic linear stability, capturing the interaction between optimization dynamics and data geometry. The key quantity in our analysis is data coherence: the cross-sample alignment of loss-surface directions near the optimum. We decompose coherence along three axes (within the retain set, within the forget set, and between them) and prove tight stability thresholds that separate convergence from divergence. To further link data properties to forgettability, we study a two-layer ReLU CNN under a signal-plus-noise model and show that stronger memorization makes forgetting easier: when the signal-to-noise ratio (SNR) is lower, cross-sample alignment is weaker, which reduces coherence and makes unlearning easier; conversely, high-SNR, highly aligned models resist unlearning. For empirical verification, we show that Hessian tests and CNN heatmaps align closely with the predicted boundary, mapping the stability frontier of gradient-based unlearning as a function of batching, mixing, and data/model alignment. Our analysis is grounded in random-matrix-theory tools and provides the first principled account of the trade-offs between memorization, coherence, and unlearning.
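To make the coherence quantity concrete, here is a minimal Python sketch, under the assumption that coherence is measured as the mean cosine alignment of per-sample loss gradients near the optimum; the helper names per_sample_grads and coherence are hypothetical illustrations, not the paper's implementation.

import torch
import torch.nn.functional as F

def per_sample_grads(model, loss_fn, xs, ys):
    # Flattened loss gradient for each sample, stacked as rows of a matrix.
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for x, y in zip(xs, ys):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    return torch.stack(rows)

def coherence(G_a, G_b):
    # Mean pairwise cosine alignment between two sets of per-sample gradients.
    # (For a within-set measurement, the diagonal self-terms are included here
    # for simplicity.)
    A = F.normalize(G_a, dim=1)
    B = F.normalize(G_b, dim=1)
    return (A @ B.T).mean().item()

# The three axes above: within-retain, within-forget, and cross coherence.
# G_r = per_sample_grads(model, loss_fn, x_retain, y_retain)
# G_f = per_sample_grads(model, loss_fn, x_forget, y_forget)
# rho_rr, rho_ff, rho_rf = coherence(G_r, G_r), coherence(G_f, G_f), coherence(G_r, G_f)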
A Experimental Setup
We provide the detailed experimental setup in this section. We empirically show that even a simple linear network can enter the EOS regime.

Fully-connected Network. We conduct further experiments on several fully-connected networks with 4 hidden layers and various activation functions. For example, the structure of a fully-connected tanh network is shown in Table 2.

Table 2: Fully-connected network

Layer #   Name   Layer                               In shape      Out shape
1                Flatten()                           (3, 32, 32)   3072
2         fc1    nn.Linear(3072, 200, bias=False)    3072          200
3                nn.Tanh()                           200           200

As in the fully-connected network experiments, we consider tanh, ReLU, and ELU activations.
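For concreteness, a minimal PyTorch sketch of such a network is given below. Only the first layers are specified in Table 2; the remaining hidden widths, the bias-free output layer, and the 10-way output head for (3, 32, 32) inputs are assumptions made for illustration.

import torch.nn as nn

def make_fc_net(activation=nn.Tanh, width=200, num_classes=10):
    # 4-hidden-layer fully-connected network; pass nn.Tanh, nn.ReLU, or nn.ELU.
    layers = [nn.Flatten(), nn.Linear(3 * 32 * 32, width, bias=False), activation()]
    for _ in range(3):  # three further hidden layers, four in total
        layers += [nn.Linear(width, width, bias=False), activation()]
    layers.append(nn.Linear(width, num_classes, bias=False))  # assumed output head
    return nn.Sequential(*layers)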
On Theoretical Interpretations of Concept-Based In-Context Learning
Tang, Huaze; Peng, Tianren; Huang, Shao-lun
In-Context Learning (ICL) has emerged as an important new paradigm in natural language processing and large language model (LLM) applications. However, the theoretical understanding of the ICL mechanism remains limited. This paper investigates this issue by studying a particular ICL approach, called concept-based ICL (CB-ICL). In particular, we propose theoretical analyses of applying CB-ICL to ICL tasks, which explain why and when CB-ICL performs well at predicting query labels in prompts with only a few demonstrations. In addition, the proposed theory quantifies the knowledge that LLMs can leverage for the prompt tasks, and leads to a similarity measure between the prompt demonstrations and the query input, which provides important insights and guidance for model pre-training and prompt engineering in ICL. Moreover, the impact of the prompt demonstration size and of the dimension of the LLM embeddings in ICL is also explored based on the proposed theory. Finally, several real-data experiments are conducted to validate the practical usefulness of CB-ICL and the corresponding theory. With the great success of large language models (LLMs), in-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP) (Brown et al., 2020; Chowdhery et al., 2023; Achiam et al., 2023), where LLMs address the requested queries in context prompts with a few demonstrations.
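As a concrete illustration of such a similarity measure, one simple instance is the mean cosine similarity between the LLM embeddings of the demonstrations and of the query; the sketch below is a hypothetical example, not the exact measure derived in the paper.

import numpy as np

def demo_query_similarity(demo_embs, query_emb):
    # demo_embs: (n_demos, d) embedding matrix; query_emb: (d,) embedding vector.
    D = demo_embs / np.linalg.norm(demo_embs, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    return float((D @ q).mean())  # mean cosine similarity

# The per-demonstration scores D @ q could also rank candidate demonstrations,
# e.g. to select the most query-relevant ones when building a prompt.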